A microscopic theory of the free energy barriers and folding routes for minimally frustrated proteins
is presented, greatly expanding on the presentation of the variational approach outlined previously
[J. J. Portman, S. Takada, P. G. Wolynes, Phys. Rev. Lett. 81, 5237 (1998)]. We choose the λ-repressor
protein as an illustrative example and focus on how the polymer chain statistics influence
free energy profiles and partially ordered ensembles of structures. In particular, we investigate the
role of chain stiffness on the free energy profile and folding routes. We evaluate the applicability of
simpler approximations in which the conformations of the protein molecule along the folding route
are restricted to have residues that are either entirely folded or unfolded in contiguous stretches. We
find that the folding routes obtained from only one contiguous folded region correspond to a chain
with a much greater persistence length than is appropriate for natural protein chains, while the folding
route obtained from two contiguous folded regions is able to capture the relatively folded regions
calculated within the variational approach. The free energy profiles obtained from the contiguous
sequence approximations have larger barriers than the more microscopic variational theory, which is
understood as a consequence of partial ordering.



I. INTRODUCTION 

Considerable progress has been made in describing protein folding using equilibrium and
nonequilibrium statistical mechanics, but a complete formal microscopic kinetic theory has only
been sketched. The primary novel features of the modern theory of folding revolve around two
somewhat different themes: the glassy dynamics expected for most heteropolymers whose sequence
is chosen at random, and the organized dynamics expected for proteins selected by evolution to
fold quickly on a funneled energy landscape. The establishment, through selection, of a funneled
landscape entails minimizing frustration, the conflict between different energy contributions in a
random sequence. Once a funneled landscape is established by selection, however, it is the
interplay between entropy and guiding energies of the funnel that figures most prominently in
determining the observed kinetics. If we neglect sidechain and solvent degrees of freedom, the
entropy depends crucially on the polymer physics of the protein chain. The statistical mechanical
theory of both the stable states and some transition states has already been outlined. In this
paper and its companion we show how the existing framework can be extended to yield a complete
microscopic theory of folding rates of completely minimally frustrated proteins. Microscopic
calculations of transition state ensembles, activation free energies, and dynamical pre-factors
involving chain motions can be obtained. A brief report on the early progress of this work has
already appeared, but here we fill in the details and also explore some interesting polymer
physics issues that we only touched on previously.



The folding transition can be considered as a finite-size phase transition involving two or more
stable phases: one a high entropy denatured state with little structure, although perhaps
collapsed; and the other the low entropy folded state. A quantitative way to distinguish among
these phases is through the magnitude of the fluctuations of each monomer about its average
position. In the denatured states, the protein explores many conformations, and because there is
little well defined structure, the fluctuations of a residue about any particular position are
relatively large. In the folded state, the conformations are much more restricted and can be
described as relatively small fluctuations about the localized positions of the average native
structure. These small amplitude deviations are reflected in part by the "temperature factors"
(or Debye-Waller factors), which can be experimentally obtained from fitting X-ray
crystallography data to a model structure that allows for these fluctuations.

The qualitative difference between the liquid and solid phases in an ordinary first order
crystallization transition can also be described by fluctuation magnitudes. The thermodynamics
and to some extent the kinetics of first order phase transitions can be studied using a free
energy density functional. To analyze the kinetics of the transition one introduces an
approximate density profile that is able to interpolate between the two phases, modeling the
formation of crystallites or droplets. The stable phases, which are minima of the functional, and
the critical nucleus, which is a saddle-point of the functional, can then be described through the
variational parameters of the trial density. From the point of view of structure, the folded
state of the protein is similar to one specific minimum of an amorphous solid in so far as it is
not infinitely periodic. One major difference between most inorganic solids and proteins is that
proteins are polymers where topology is important; the residues not only have a chemical
identity but also a definite sequence. Accordingly, the uniqueness of the folded structure
refers not only to the 3-dimensional shape, but also to the specific residues localized at the
coordinates of the native structure. This suggests that in order to apply the free energy
functional formalism to study the folding of a particular protein, one should use order
parameters that are local in sequence.

An approach to folding kinetics based on a "site-resolved" protein folding free energy functional
was presented previously. In this approach, a variational free energy surface is introduced
directly through a reference Hamiltonian, which provides a good way of approximating the density.
The resulting free energy surface is parameterized by the fluctuations of each residue about the
average, folded conformation. When put into practice the scheme is very similar in spirit to the
density functional calculations described above. The variational approach has also been used to
characterize the protein folding phase diagram as well as to study folding nucleation without
attending to the specifics of a given native protein structure. In order to explore general
issues of the role of the polymer chain statistics in folding, we specialize our calculation to
study a particular protein that folds to a known structure. The example we choose, the
λ-repressor protein, has been much studied in the laboratory, but we will explore how its folding
routes and free energy profile are changed upon varying polymer statistics, sometimes using
"unphysical" values of the backbone parameters in order to gain insight.



II. GAUSSIAN MODEL FOR A STIFF CHAIN 

Consider the Gaussian approximation to the probability density for the $n$ monomer positions
$\{\mathbf{r}_i\}$ of a polymer chain

$$\Psi[\{\mathbf{r}\}] \sim \exp\left[-\frac{3}{2a^2}\sum_{ij}\mathbf{r}_i\cdot\Gamma_{ij}\,\mathbf{r}_j\right], \tag{1}$$

where $a$ is a microscopic length scale taken to be the root mean square distance between
adjacent monomers, and we have assumed the mean position vanishes. The correlations of monomer
positions in Eq. (1) are given by
$\langle\mathbf{r}_i\cdot\mathbf{r}_j\rangle/a^2 = [\Gamma^{-1}]_{ij}$.

Different choices of the correlations result in different Gaussian models for the polymer
backbone. It is most natural to define the model chain in terms of the correlations between the
$n-1$ bond vectors: $\mathbf{a}_i = \mathbf{r}_{i+1} - \mathbf{r}_i$. Denoting correlations
between bond vectors by
$\langle\mathbf{a}_i\cdot\mathbf{a}_j\rangle/a^2 = [\Gamma^{(b)\,-1}]_{ij}$, the bond
correlations and the monomer position correlations are related by

$$\Gamma = M^{T}\,\Gamma^{(b)}\,M, \tag{2}$$

where $M$ is the $(n-1)\times n$ nearest neighbor difference matrix

$$M = \begin{pmatrix}
-1 & 1 & & & \\
 & -1 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & -1 & 1
\end{pmatrix}. \tag{3}$$



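A simple way to account for chain stiffness is to assume there is a fixed angle $\theta$ between
adjacent bonds. This freely rotating chain model has bond correlations that decay exponentially
as $\langle\mathbf{a}_i\cdot\mathbf{a}_j\rangle/a^2 = g^{|i-j|}$, where $g = \cos\theta$.
Inverting the matrix of bond correlations and using Eq. (2) gives the inverse of the monomer
correlations

$$\Gamma = \frac{1-g}{1+g}\,K^{R} + \frac{g}{1-g^{2}}\,\left(K^{R}\right)^{2} - \frac{g^{2}}{1-g^{2}}\,\Delta, \tag{4}$$

where $K^{R}$ is the Rouse matrix for a nearest neighbor harmonic chain

$$K^{R} = \begin{pmatrix}
1 & -1 & & & \\
-1 & 2 & -1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -1 & 2 & -1 \\
 & & & -1 & 1
\end{pmatrix} \tag{5}$$

and $\Delta$ accounts for the "boundaries" at the ends of the chain (all elements not shown are
zero)

$$\Delta = \begin{pmatrix}
1 & -1 & & & \\
-1 & 1 & & & \\
 & & 0 & & \\
 & & & 1 & -1 \\
 & & & -1 & 1
\end{pmatrix}. \tag{6}$$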



In the continuum limit, this Gaussian polymer model gives the familiar form of the wormlike chain
that restricts the mean separation between adjacent monomers as well as the local curvature of
the chain. In this paper, we use the discrete representation and identify the monomers to be the
α carbons composing the polypeptide backbone. With this choice, the root mean square separation
distance $a$ is the typical distance between adjacent α carbons, $a \approx 3.8$ Å.

The other parameter in the chain model defines the chain stiffness: as $g \to 0$,
$\Gamma = K^{R}$ gives the familiar correlations of a flexible chain
($\langle\mathbf{a}_i\cdot\mathbf{a}_j\rangle = a^{2}\delta_{ij}$), and as $g \to 1$, the
correlations correspond to that of a rigid rod
($\langle\mathbf{a}_i\cdot\mathbf{a}_j\rangle = a^{2}$). Another measure of the chain stiffness
is given by the persistence length
$l = \sum_{j\ge i}\langle\mathbf{a}_i\cdot\mathbf{a}_j\rangle/a$.
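As a consistency check on the stiff-chain construction, the following minimal sketch builds the inverse monomer correlation matrix in two ways and compares them. It assumes the relation $\Gamma = M^{T}\Gamma^{(b)}M$ between bond and monomer matrices and the closed form $\Gamma = \frac{1-g}{1+g}K + \frac{g}{1-g^2}K^2 - \frac{g^2}{1-g^2}\Delta$ with $K = M^{T}M$; the function names and the chain length are illustrative choices, not taken from the paper.

```python
import numpy as np

def difference_matrix(n):
    """(n-1) x n nearest neighbor difference matrix: a_i = r_{i+1} - r_i."""
    M = np.zeros((n - 1, n))
    i = np.arange(n - 1)
    M[i, i] = -1.0
    M[i, i + 1] = 1.0
    return M

def gamma_closed_form(n, g):
    """Closed-form inverse monomer correlations of the freely rotating chain."""
    M = difference_matrix(n)
    K = M.T @ M                                   # Rouse matrix
    Delta = np.zeros((n, n))                      # chain-end ("boundary") terms
    Delta[:2, :2] += [[1.0, -1.0], [-1.0, 1.0]]
    Delta[-2:, -2:] += [[1.0, -1.0], [-1.0, 1.0]]
    return ((1 - g) / (1 + g)) * K \
        + (g / (1 - g**2)) * (K @ K) \
        - (g**2 / (1 - g**2)) * Delta

def gamma_direct(n, g):
    """Same matrix obtained by inverting the bond correlations g^|i-j| numerically."""
    M = difference_matrix(n)
    k = np.arange(n - 1)
    bond_corr = g ** np.abs(k[:, None] - k[None, :])
    return M.T @ np.linalg.inv(bond_corr) @ M

if __name__ == "__main__":
    n, g = 8, 0.8
    diff = np.abs(gamma_closed_form(n, g) - gamma_direct(n, g)).max()
    print(f"max |closed form - direct| = {diff:.2e}")
```

Every row of the resulting matrix sums to zero, reflecting translational invariance of the unconfined chain; this is why a confinement term is needed before the matrix can be inverted.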

It is possible to introduce non-uniform stiffness parameters (and hence local persistence
lengths) in order to model the different flexibilities of the monomers composing a
heteropolymer. For example, the bond correlations can be extracted from the rotational isomeric
states model of Flory, or from equilibrium simulations of a detailed polymeric potential. Still
more complex models can be used to model explicit chiral helical tendency through anisotropic
Gaussian correlations. For simplicity, we assume in this paper that the chain stiffness is
uniform, so that the persistence length is related to the chain stiffness by
$l \approx a/(1-g)$. For proteins, a reasonable value for the chain stiffness is $g = 0.8$,
which corresponds to the persistence length of polyalanine, $l = 5a \approx 20$ Å. In this paper
we will also compare chains of other uniform stiffnesses.

In addition to the Gaussian correlations given by Eq. (1) and Eq. (4) for a stiff chain, we also
include a confining potential that controls the overall size of the polymer chain, which is to
say it determines the proximity to the chain collapse transition. To model a collapsed stiff
chain, we consider the chain Hamiltonian

$$\beta H_{\rm chain} = \frac{3}{2a^{2}}\sum_{ij}\mathbf{r}_i\cdot\Gamma_{ij}\,\mathbf{r}_j + \frac{3}{2a^{2}}\,B\sum_{i}\mathbf{r}_i^{2}. \tag{7}$$

The second term controls the degree of collapse of the chain through a confining potential where
the parameter $B$ is conjugate to the radius of gyration of the chain. To establish notation for
future use, we rewrite Eq. (7) as

$$\beta H_{\rm chain} = \frac{3}{2a^{2}}\sum_{ij}\mathbf{r}_i\cdot\left[\Gamma^{(\rm ch)}\right]_{ij}\mathbf{r}_j, \tag{8}$$

where

$$\left[\Gamma^{(\rm ch)}\right]_{ij} = \Gamma_{ij} + B\,\delta_{ij} \tag{9}$$

are the inverse monomer correlations of the collapsed stiff chain.



III. VARIATIONAL FREE ENERGY SURFACE

The Hamiltonian for our protein model is

$$H = H_{\rm chain} + H_{\rm int}, \tag{10}$$

where $H_{\rm chain}$ is the backbone potential defining the polymeric correlations given in
Eq. (8) and $H_{\rm int}$ is the (2-body) interaction potential between distant monomers. The
interactions between distant monomers are modeled by a pair potential $u(r)$

$$H_{\rm int} = \sum_{[ij]}\epsilon_{ij}\,u(|\mathbf{r}_i - \mathbf{r}_j|), \tag{11}$$

where $\epsilon_{ij}$, the strength of the interaction, depends on the identity of the residues
$i$ and $j$. The spatial dependence of the interactions between distant monomers consists of an
attractive well and a repulsive core. For computational convenience, we approximate the
interaction potential as the sum of three Gaussians:

$$u(r) = \sum_{k=(s,i,l)}\gamma_k\exp\left[-\frac{3}{2a^{2}}\,\alpha_k r^{2}\right], \tag{12}$$

where $\alpha_s > \alpha_i > \alpha_l$ set the ranges of the short-, intermediate-, and
long-range interactions, respectively. The intermediate-range term is repulsive
($\gamma_i > 0$) and the long-range term is attractive ($\gamma_l < 0$); the intermediate- and
long-ranged potential parameters are chosen so that the sum of these two terms gives a potential
well at an appropriate distance for contacts in the native structure. The short-range term is
repulsive ($\gamma_s > 0$) and represents the hard core repulsion between residues. (For an
example of $u(r)$ see Fig. 1.)

We approximate the free energy surface of the protein using a reference Hamiltonian that
corresponds to a polymer in a non-uniform external field that constrains the monomers to lie near
their locations in the native structure $\{\mathbf{r}_i^{N}\}$

$$\beta H_0 = \beta H_{\rm chain} + \frac{3}{2a^{2}}\sum_i C_i\left(\mathbf{r}_i - \mathbf{r}_i^{N}\right)^{2}. \tag{13}$$

The strengths of the harmonic constraints, $\{C_i\}$, are conjugate to the fluctuations of the
polymer about each of the native positions. The external constraints in $H_0$ influence both the
correlations $G_{ij}$ and average positions $\{\mathbf{s}_i\}$ of the monomers composing the
reference chain:

$$G_{ij} = \langle\delta\mathbf{r}_i\cdot\delta\mathbf{r}_j\rangle_0/a^{2} = \left[\Gamma^{(0)}\right]^{-1}_{ij}, \tag{14}$$

$$\mathbf{s}_i = \langle\mathbf{r}_i\rangle_0 = \sum_j G_{ij}\,C_j\,\mathbf{r}_j^{N}, \tag{15}$$

where $\delta\mathbf{r}_i$ is the position of the $i$th monomer relative to the average

$$\delta\mathbf{r}_i = \mathbf{r}_i - \langle\mathbf{r}_i\rangle_0, \tag{16}$$

and $\Gamma^{(0)}$ is the matrix of coefficients of the quadratic terms of $H_0$

$$\Gamma^{(0)}_{ij} = \Gamma^{(\rm ch)}_{ij} + C_i\,\delta_{ij}. \tag{17}$$

From the magnitude of these fluctuations, this reference Hamiltonian can distinguish the two
stable phases of the protein (as described above): the globule corresponds to large fluctuations
(weak constraints) and folded states correspond to small fluctuations (strong constraints).
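To make the role of the constraints concrete, here is a minimal numerical sketch of the reference ensemble. It assumes the relations $\Gamma^{(0)} = \Gamma^{(\rm ch)} + {\rm diag}(C)$, $G = [\Gamma^{(0)}]^{-1}$, and $\mathbf{s}_i = \sum_j G_{ij} C_j \mathbf{r}_j^{N}$, with lengths in units of $a$. The helix-like "native" coordinates, the chain length, the confinement value, and the constraint strengths are all hypothetical values chosen for illustration only.

```python
import numpy as np

def chain_gamma(n, g, B):
    """Stiff-chain inverse correlations plus diagonal confinement B (units of a = 1)."""
    M = np.zeros((n - 1, n))
    i = np.arange(n - 1)
    M[i, i] = -1.0
    M[i, i + 1] = 1.0
    bond_corr = g ** np.abs(i[:, None] - i[None, :])   # <a_i . a_j>/a^2 = g^|i-j|
    return M.T @ np.linalg.inv(bond_corr) @ M + B * np.eye(n)

def reference_ensemble(gamma_ch, C, r_native):
    """Return correlations G and mean positions s of the reference Hamiltonian."""
    gamma0 = gamma_ch + np.diag(C)          # quadratic coefficients of H_0
    G = np.linalg.inv(gamma0)               # monomer correlations
    s = G @ (C[:, None] * r_native)         # mean positions
    return G, s

def toy_native(n):
    """Hypothetical helix-like coordinates (units of a), centered at the origin."""
    k = np.arange(n)
    phi = np.deg2rad(100.0) * k
    r = np.column_stack((0.6 * np.cos(phi), 0.6 * np.sin(phi), 0.4 * k))
    return r - r.mean(axis=0)

if __name__ == "__main__":
    n = 12
    gamma_ch = chain_gamma(n, g=0.8, B=1e-3)     # B is an assumed value
    r_nat = toy_native(n)
    for label, c in [("weak", 1e-3), ("strong", 1e3)]:
        G, s = reference_ensemble(gamma_ch, np.full(n, c), r_nat)
        print(f"{label:6s} constraints: mean G_ii = {np.diag(G).mean():.3e}, "
              f"max |s - r_native| = {np.abs(s - r_nat).max():.3e}")
```

Weak constraints leave large fluctuations $G_{ii}$ and mean positions far from the native coordinates (globule-like), whereas strong constraints give small fluctuations with mean positions close to the native structure (folded-like).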

We consider the variational free energy surface parameterized by the constraint parameters
$\{C_i\}$

$$F[\{C\}] = -k_B T\log Z_0 + \langle H - H_0\rangle_0, \tag{18}$$

where $Z_0 = {\rm Tr}\,[e^{-\beta H_0}]$ is the partition function of the reference Hamiltonian,
and $\langle\ldots\rangle_0 = {\rm Tr}\,[\ldots e^{-\beta H_0}]/Z_0$ denotes the average taken
with respect to $H_0$. Substituting the expressions for $H$ and $H_0$ gives the variational free
energy $F = E - TS$, where

$$E[\{C\}] = \sum_{[ij]}\epsilon_{ij}\,\langle u(|\mathbf{r}_{ij}|)\rangle_0, \qquad \mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j, \tag{19}$$

and

$$S[\{C\}]/k_B = \log Z_0 + \frac{3}{2a^{2}}\sum_i C_i\,\langle(\mathbf{r}_i - \mathbf{r}_i^{N})^{2}\rangle_0 \tag{20}$$

are the expressions for the energy $E[\{C\}]$ and entropy $S[\{C\}]$ as functions of the
variational constraints.

The averages over $H_0$ in Eq. (19) and Eq. (20) can be expressed in terms of $G$ and
$\{\mathbf{s}_i\}$, because $e^{-\beta H_0}$ is a Gaussian distribution. One instructive way to
calculate the averages is to introduce approximations to the density of monomer $i$,
$\rho_i(\mathbf{r}) = \langle\delta(\mathbf{r} - \mathbf{r}_i)\rangle_0$,

$$\rho_i(\mathbf{r}) = \left(\frac{3}{2\pi a^{2}G_{ii}}\right)^{3/2}\exp\left[-\frac{3\,(\mathbf{r} - \mathbf{s}_i)^{2}}{2a^{2}G_{ii}}\right], \tag{21}$$

and the pair density between $i$ and $j$,
$\rho_{ij}(\mathbf{r}) = \langle\delta(\mathbf{r} - (\mathbf{r}_i - \mathbf{r}_j))\rangle_0$,

$$\rho_{ij}(\mathbf{r}) = \left(\frac{3}{2\pi a^{2}\,\delta G_{ij}}\right)^{3/2}\exp\left[-\frac{3\,(\mathbf{r} - (\mathbf{s}_i - \mathbf{s}_j))^{2}}{2a^{2}\,\delta G_{ij}}\right], \tag{22}$$

where
$\delta G_{ij} = \langle(\delta\mathbf{r}_i - \delta\mathbf{r}_j)^{2}\rangle_0/a^{2} = G_{ii} + G_{jj} - 2G_{ij}$.
These densities depend on the constraint parameters $\{C_i\}$ through $G$ and
$\{\mathbf{s}_i\}$. Averages over $H_0$ can be calculated through $\rho_i(\mathbf{r})$ and
$\rho_{ij}(\mathbf{r})$, for example,

$$\langle u(|\mathbf{r}_{ij}|)\rangle_0 = \int d\mathbf{r}\,\rho_{ij}(\mathbf{r})\,u(r). \tag{23}$$

In this way, the variational free energy can be viewed as a density functional with a particular
approximation to the density that simultaneously incorporates the polymeric correlations and the
monomeric fluctuations about the average positions.

It is straightforward to calculate the entropy and energy in terms of the monomer correlations
and mean positions. After some manipulation (see Appendix), the entropy can be written, up to an
additive constant independent of the constraints, as

$$S[\{C\}]/k_B = \frac{3}{2}\log\det G - \frac{3}{2a^{2}}\sum_{ij}\mathbf{s}_i\cdot\Gamma^{(\rm ch)}_{ij}\,\mathbf{s}_j + \frac{3}{2}\sum_i C_i\,G_{ii}. \tag{24}$$

This expression can be interpreted as follows: the first term is the entropy of the chain due to
polymeric fluctuations, the second term is the entropy loss of fixing each monomer to the average
positions, and the last term is the entropy of the vibrations about the mean position
($= (3/2a^{2})\sum_i C_i\langle\delta\mathbf{r}_i^{2}\rangle_0$). Similarly, the pair potential
can be averaged over $H_0$ to give the energy

$$E[\{C\}] = \sum_{[ij]}\epsilon_{ij}\,u_{ij}, \tag{25}$$

where

$$u_{ij} = \sum_{k=(s,i,l)}\gamma_k\left(1 + \alpha_k\,\delta G_{ij}\right)^{-3/2}\exp\left[-\frac{3}{2a^{2}}\,\frac{\alpha_k\,(\mathbf{s}_i - \mathbf{s}_j)^{2}}{1 + \alpha_k\,\delta G_{ij}}\right]. \tag{26}$$

Finally, we choose to measure the free energy relative to the unconstrained chain

$$\Delta F[\{C\}] = \Delta E[\{C\}] - T\,\Delta S[\{C\}], \tag{27}$$

where, for example, $\Delta F[\{C\}] = F[\{C\}] - F[\{C = 0\}]$.
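The Gaussian average of the pair potential can be checked numerically. The sketch below assumes the three-Gaussian form $u(r) = \sum_k \gamma_k \exp[-(3/2a^2)\alpha_k r^2]$ and an isotropic Gaussian pair density with mean $\mathbf{s}_i - \mathbf{s}_j$ and variance $a^2\,\delta G_{ij}/3$ per Cartesian component, and compares the closed-form average with a direct quadrature. Lengths are in units of $a$; the potential parameters are those quoted later in the model-parameter discussion, while the test separation and $\delta G_{ij}$ are arbitrary illustrative values.

```python
import numpy as np

# Potential parameters (strength gamma_k, inverse-range alpha_k), lengths in units of a.
GAMMA = {"s": 25.0, "i": 9.0, "l": -6.0}
ALPHA = {"s": 3.0, "i": 0.54, "l": 0.27}

def u_avg_closed(s_ij, dG):
    """Closed-form average of u over the Gaussian pair density."""
    s2 = float(np.dot(s_ij, s_ij))
    total = 0.0
    for k in GAMMA:
        denom = 1.0 + ALPHA[k] * dG
        total += GAMMA[k] * denom**-1.5 * np.exp(-1.5 * ALPHA[k] * s2 / denom)
    return total

def u_avg_numeric(s_ij, dG, half_width=12.0, m=4001):
    """Same average by quadrature; the 3D integral factorizes into 1D integrals."""
    sigma2 = dG / 3.0                       # variance per Cartesian component
    sigma = np.sqrt(sigma2)
    total = 0.0
    for k in GAMMA:
        c = 1.5 * ALPHA[k]
        prod = 1.0
        for sd in s_ij:
            x = np.linspace(sd - half_width * sigma, sd + half_width * sigma, m)
            dx = x[1] - x[0]
            density = np.exp(-(x - sd) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
            prod *= np.sum(density * np.exp(-c * x**2)) * dx
        total += GAMMA[k] * prod
    return total

if __name__ == "__main__":
    s_ij = np.array([1.2, 0.8, 0.5])        # illustrative mean separation
    dG = 0.5                                # illustrative relative fluctuation
    print("closed form:", u_avg_closed(s_ij, dG))
    print("quadrature :", u_avg_numeric(s_ij, dG))
```

Because both the density and the potential are Gaussian, the average has a closed form; this is the computational convenience behind the choice of a sum of Gaussians for the potential.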

The reference Hamiltonian plays such a prominent role in the variational theory (and in
subsequent calculations of folding dynamics) that it warrants further comment. Not surprisingly,
other Gaussian models have been previously introduced to model polymers with fixed contacts or
crosslinks. A Hamiltonian of the form

$$\beta H_{\rm h.c.} = \frac{3}{2a^{2}}\sum_i(\mathbf{r}_i - \mathbf{r}_{i+1})^{2} + C\sum_{[ij]}(\mathbf{r}_i - \mathbf{r}_j)^{2} \tag{28}$$

has been used to study the thermodynamics and dynamics of polymers crosslinked at the sites
specified by the pairs $[ij]$. This harmonic contact Hamiltonian, $H_{\rm h.c.}$, has also been
used to model the vibrations of folded proteins with the set $[ij]$ limited to the contacts of
the native structure, and it was found that the relative magnitudes of monomer fluctuations agree
well with measured temperature factors. In contrast to our reference Hamiltonian,
$H_{\rm h.c.}$ is translationally invariant, independent of an explicit native structure. While
enforcing this symmetry has some advantages, the potential well is centered at the origin. We
note that recently, $H_{\rm h.c.}$ was used as a mean spherical model for protein folding where a
nonlinear constraint was added so that the average monomer positions would lie on the surface of
a sphere, preventing the polymer conformation from collapsing to the origin; the minima of
$H_{\rm h.c.}$ with the added imposed condition can yield a meaningful average structure.
Nevertheless, the mean locations of the residues are quite distorted. Consequently,
$H_{\rm h.c.}$ is not as well suited as the Hamiltonian we use to describe the protein folding
transition where the disordered globule and the structured folded state are separated by a
barrier composed of an ensemble of partially ordered conformations. We have, however,
investigated $H_{\rm h.c.}$ as a reference Hamiltonian in the variational context. It gives
similar results to those described here.



IV. ORDER PARAMETERS AND FOLDING 
ROUTES 

Setting values for the constraint parameters $\{C\}$ corresponds to selecting an ensemble of
conformations specified by $\{\mathbf{s}\}$ and $G$. The energy of a pair is most stabilizing
when the pair density is contained in the potential well, i.e., the mean separation between
monomers is within the well and the fluctuations are relatively small. Accompanying this
stabilization, however, is the entropy loss of localizing the positions of the pair. In general,
when there are many non-zero constraints, the entropy loss due to localization is given by
Eq. (24) through the correlations. In other free energy functionals this entropy loss is
estimated, but there are difficulties in considering only the entropy loss of individual pairs
forming loops because the total entropy loss is inherently nonadditive. The relation of the two
approaches is much like the difference between the Thomas-Fermi and Hohenberg-Kohn estimates of
the kinetic energy in quantum mechanical density functional theory. The values of the constraint
parameters corresponding to the local minima of $F[\{C_i\}]$ are a compromise between the energy
and entropy decrease of forming contacts. Similarly, the constraints that correspond to
saddle-points of $F[\{C_i\}]$ also reflect this competition of energy and entropy, because the
free energy is minimized in all directions except the unstable mode along which there is a
maximum.

The configurational ensemble corresponding to a given set of constraint parameters $\{C_i\}$ can
also be described by density-like order parameters that depend on the local mean square
fluctuations. For any set of functions of the chain positions $\{A_i[\{\mathbf{r}\}]\}$, we can
define the order parameter $A_i[\{C\}] = \langle A_i\rangle_0$ as a function of the constraints.
This relationship can be inverted locally provided the Jacobian is nonsingular,
$\det J \neq 0$ with $J_{ij} = \partial A_i/\partial C_j$. Since these order parameters are a
function of the constraints we can then parameterize the free energy by
$F[\{A\}] = F[\{C\}]$ with $C_i = C_i[\{A\}]$. For example, the form of $H_0$ suggests that the
local mean square fluctuations (related to Debye-Waller factors)

$$B_i = \langle\delta\mathbf{r}_i^{2}\rangle_0 = G_{ii}\,a^{2} \tag{29}$$

are a natural set of order parameters for the reference Hamiltonian. (Indeed, this is what
motivated our choice of $H_0$.) In studying the dynamics of barrier crossing in the companion
paper, it will prove useful to consider a related but different measure of native similarity.
With any identification of the order parameters, we can study the properties of the free energy
in $\{C_i\}$-space to describe the folding, and then characterize the corresponding ensembles
through structural order parameters as equilibrium averages with Hamiltonian $H_0$.

We calculate the transition states involved in the folding by searching for saddle-points in
$F[\{C_i\}]$ using an eigenvector-following algorithm. This algorithm is similar to Newton's
method for optimization, but involves diagonalizing the Hessian matrix,
$\partial^{2}F/\partial C_i\,\partial C_j$, at each iteration. In this routine, the point is
updated by stepping in a direction to maximize along the eigenvector with the lowest eigenvalue
and minimize along all others. To find a minimum, a step is taken to minimize along all
eigenvectors of the Hessian. In order to use this algorithm, we need to be able to differentiate
the free energy with respect to $\{C_i\}$,
$\partial_\alpha F = \partial F/\partial C_\alpha$. These derivatives can be easily computed by
the chain rule using the elementary derivatives
$\partial_\alpha G_{ij} = -G_{i\alpha}G_{\alpha j}$ and
$\partial_\alpha(\log\det G) = -G_{\alpha\alpha}$. The explicit expressions for the derivatives
of the energy and entropy are not given here; they are straightforward to derive, and not very
illuminating.
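The two elementary derivatives quoted above follow from $G = [\Gamma^{(\rm ch)} + {\rm diag}(C)]^{-1}$, and can be verified by finite differences. The following sketch does so for a small toy chain; the flexible Rouse chain with confinement, the random constraint values, and the step size are assumptions made purely for the check.

```python
import numpy as np

n, B = 6, 0.1            # toy chain length and confinement (assumed values)
alpha, h = 2, 1e-6       # constraint index to differentiate, finite-difference step

# Flexible (g = 0) Rouse chain plus confinement, standing in for Gamma^(ch).
M = np.zeros((n - 1, n))
i = np.arange(n - 1)
M[i, i] = -1.0
M[i, i + 1] = 1.0
gamma_ch = M.T @ M + B * np.eye(n)

rng = np.random.default_rng(0)
C = rng.uniform(0.5, 1.5, n)     # arbitrary positive constraint strengths

def G_of(C):
    """Monomer correlations of the reference chain for constraints C."""
    return np.linalg.inv(gamma_ch + np.diag(C))

def logdet_G(C):
    """log det G, computed stably via slogdet."""
    return np.linalg.slogdet(G_of(C))[1]

e = np.zeros(n)
e[alpha] = h
G = G_of(C)

# Central finite differences versus the analytic expressions.
dG_numeric = (G_of(C + e) - G_of(C - e)) / (2 * h)
dG_analytic = -np.outer(G[:, alpha], G[alpha, :])
dlogdet_numeric = (logdet_G(C + e) - logdet_G(C - e)) / (2 * h)
dlogdet_analytic = -G[alpha, alpha]

if __name__ == "__main__":
    print("max error in dG/dC_alpha      :", np.abs(dG_numeric - dG_analytic).max())
    print("error in d(log det G)/dC_alpha:", abs(dlogdet_numeric - dlogdet_analytic))
```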

The saddle-points and local minima characterize the 
average folding routes from this theory. These average 
pathways are found as follows. The globule and na- 
tive states are identified by the local minima with the 
largest and smallest entropy, respectively. These are easy 
to identify, because the globule is the only stable mini- 
mum at high temperature and the native is the only one 
at low temperature; these minima can be used as the 
initial guesses for the optimization algorithm for incre- 
mental temperature changes until we have these minima 
at the same temperature. Using linear combinations of 
these two sets of constraints as initial guesses, we search 
for a saddle-point. From this saddle-point, we perturb 
the set of constraints {Ci} along the unstable eigenvector
and use the eigenvector-following algorithm with a
small step size to find the closest minimum. This gives 
two local minima, one for each direction on the unstable 
eigenvector, connected by the saddle-point. This process 
is repeated until the globule and native state are con- 
nected by a series of local minima and saddle-points. We 
identify this connected sequence as the average folding 
route, characterizing the transition states and local min- 
ima that are important in the folding kinetics. We note 
that in the example below for the λ-repressor protein,
only one folding route was found, but this is not a gen- 
eral result of the theory. The same procedure applied to 
the SH3 domain has shown that there may be multiple 
routes (unpublished). 
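The search procedure described above can be illustrated on a toy surface. The sketch below is a simplified eigenvector-following step (Newton-like steps in the Hessian eigenbasis, uphill along the lowest mode for a saddle search, downhill along all modes for a minimum search, with a capped step length). It is not the authors' implementation: the double-well test function, the step cap, and the starting points are illustrative assumptions, and the saddle search is started in a region where the lowest Hessian mode already points along the reaction direction.

```python
import numpy as np

def grad(x):
    """Gradient of the toy surface f(x, y) = (x^2 - 1)^2 + y^2."""
    return np.array([4.0 * x[0] * (x[0] ** 2 - 1.0), 2.0 * x[1]])

def hess(x):
    """Hessian of the toy surface."""
    return np.array([[12.0 * x[0] ** 2 - 4.0, 0.0], [0.0, 2.0]])

def eigenvector_following(x0, grad, hess, n_uphill=1,
                          max_step=0.3, tol=1e-10, max_iter=200):
    """Find a stationary point: maximize along the n_uphill lowest Hessian
    modes and minimize along the rest (n_uphill=0 gives a minimum search)."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        gvec = grad(x)
        if np.linalg.norm(gvec) < tol:
            break
        lam, V = np.linalg.eigh(hess(x))          # eigenvalues in ascending order
        g_modes = V.T @ gvec                      # gradient in the eigenbasis
        step_modes = -g_modes / np.maximum(np.abs(lam), 1e-8)   # downhill Newton-like
        step_modes[:n_uphill] *= -1.0             # go uphill along the lowest modes
        step = V @ step_modes
        length = np.linalg.norm(step)
        if length > max_step:                     # keep steps small
            step *= max_step / length
        x = x + step
    return x

if __name__ == "__main__":
    saddle = eigenvector_following([0.3, 0.2], grad, hess, n_uphill=1)
    print("saddle point:", saddle)
    # Perturb along the unstable mode in both directions and minimize.
    for sign in (+1.0, -1.0):
        start = saddle + sign * np.array([0.05, 0.0])
        print("connected minimum:", eigenvector_following(start, grad, hess, n_uphill=0))
```

On this surface the saddle at the origin connects the two minima at $x = \pm 1$, mirroring how each saddle-point of the free energy is linked to two local minima by perturbing along the unstable eigenvector.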

V. FOLDING ROUTES EXAMPLE: λ-REPRESSOR

In this section, we illustrate the variational theory by studying the folding of a variant of the
λ-repressor protein. λ6-85 is a small (80 residue) protein consisting of five helices in the
native structure. This protein folds extremely rapidly following two-state kinetics. From NMR
measurements of the folding rate of various mutants, Oas and coworkers concluded that the
structure of the transition state consisted mainly of residues in helices H1 and H4. In Ref. 16
we compared folding routes calculated from the variational theory with these measurements, using
a reasonable choice for the persistence length of the polypeptide chain. We investigate here how
these results depend on different values of the persistence length (or chain stiffness). These
studies allow us to see how some recent simplified approaches to free energy profiles based on
assuming complete contiguous sequence folding become more exact as chain stiffness increases.

A. Model Parameters

To apply the theory, we need to specify the parameters that describe the interaction potential
between residues and the polymer chain characteristics. The parameters of the present paper are
the same as those chosen in Ref. 16, but we describe our rationale for this choice in greater
detail. For this fast folding protein, we consider a Go model for the interactions. This means
that the sum over residues $[ij]$ in Eq. (11) is limited to the set of contacts found in the
native structure. This set is defined to be pairs of residues ($i + 1 < j$) that have β carbon
(α carbon for glycine) distances within a 6.5 Å cutoff in the folded structure. A cutoff between
6 and 8 Å is commonly used to define contacts in a Go model, though the precise value is not
generally important. We also include in this set residue pairs that are likely to have hydrogen
bonds (as determined by the DSSP algorithm) but fall outside the cutoff. The strength of the
interactions for this set depends on the residue identities of the pair. We take the well depths
$\epsilon_{ij}$ to be the magnitude of the Miyazawa-Jernigan energy parameters in units of
$\epsilon_0 = k_B T$.

The parameters for the interaction potential $u(r)$ in Eq. (12) are constrained by the
Cα-Cα distances of the set of native contacts. The intermediate- and long-range parameters are
chosen so that the sum of these Gaussians has an attractive well that contains all the native
contacts and has a minimum value $u(r^{*}) = -1$ at the most probable contact distance
$r^{*} = 1.6a$. The contact distribution and potential with
$(\gamma_i, \alpha_i; \gamma_l, \alpha_l) = (9, 0.54; -6, 0.27)$ are shown in Fig. 1.
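The quoted well parameters can be checked directly. The sketch below assumes the Gaussian form $\gamma_k \exp[-(3/2)\alpha_k (r/a)^2]$ for each term of the potential and locates the minimum of the intermediate- plus long-range part on a grid; the grid range and resolution are arbitrary choices.

```python
import numpy as np

def gaussian_term(x, gamma, alpha):
    """One Gaussian term of the pair potential; x = r/a."""
    return gamma * np.exp(-1.5 * alpha * x**2)

def attractive_well(x, gamma_i=9.0, alpha_i=0.54, gamma_l=-6.0, alpha_l=0.27):
    """Sum of the intermediate-range (repulsive) and long-range (attractive) terms."""
    return gaussian_term(x, gamma_i, alpha_i) + gaussian_term(x, gamma_l, alpha_l)

x = np.linspace(0.5, 4.0, 35001)         # r/a grid with spacing 1e-4
well = attractive_well(x)
r_star = x[np.argmin(well)]              # location of the minimum, in units of a
u_min = well.min()                       # depth of the well

if __name__ == "__main__":
    print(f"minimum u = {u_min:.4f} at r* = {r_star:.3f} a "
          f"(about {3.8 * r_star:.1f} Angstrom for a = 3.8 Angstrom)")
```

Under this assumed form the well depth is $-1$ at $r^{*} \approx 1.65a$, consistent with the quoted minimum value and, to the precision given, with the quoted contact distance.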

The short-range interactions represent the hard core repulsion between the monomers which
controls the density of the collapsed polymer. Due to the Gaussian chain approximation, the pair
density given by Eq. (22) has non-zero density at short distances, and hence even a de-localized
pair has energy contributions from both the repulsive and attractive components of the
potential. In the model, the short-range Gaussian amounts to an effective potential that balances
the attractive potential for relatively unconstrained polymers. Choosing the repulsion by this
criterion is analogous to finding Θ solvent conditions for the unconstrained polymer (such as the
globule). To determine a reasonable repulsive potential for this model, we consider a
one-dimensional approximation to the variational free energy by setting all the constraints equal,
$C_i = C$. As can be seen in Fig. 2, the energy as a function of $C$ is monotonically
stabilizing if $\gamma_s$ is small, and has a barrier if $\gamma_s$ is large. The parameters for
the repulsive potential are chosen so that the energy is relatively constant for small values of
the constraint parameter: $(\gamma_s, \alpha_s) = (25, 3.0)$. (This particular value of the
strength depends on the width of the short-ranged Gaussian, which has been chosen somewhat
arbitrarily.)

The remaining parameter to be specified is $B$, the strength of the diagonal confinement term.
This confinement parameter is a small constant that affects the fluctuations of unconstrained
segments of the chain, since the constraint parameters are also diagonal terms in the inverse
correlation matrix. Fig. 3 shows the radius of gyration,

$$R_g^{2} = \frac{1}{n}\sum_i\langle(\mathbf{r}_i - \mathbf{r}_{\rm cm})^{2}\rangle_0, \tag{30}$$

evaluated at the native coordinates (dashed) and an un- 
constrained chain, Ci = 0, (solid) as a function of persis- 
tence length for B = 1Q~^. The unconstrained radius of 
gyration rises rapidly as the persistence length increases and saturates to a value less than
twice the native radius of gyration. The radius of gyration, which in the absence of confinement
approaches a large value in the stiff chain limit, is seen to be limited by the confinement term.
This tension between local stiffness and confinement is responsible for the plateau in $R_g$.
Fig. 3 also shows the radius of gyration evaluated at the constraints corresponding to the
globule minimum (○). For persistence lengths less than $l \approx 10a$ the globule constraints
lead to a smaller radius of gyration than the free chain value, and for larger persistence
length chains the radius of gyration is somewhat larger. Although it is possible to choose
values of $B$ in order to set $R_g$ of the globule for each chain stiffness, we have chosen to
illustrate the effects on the polymer conformations by fixing the confinement and independently
varying the chain stiffness.

B. Two-dimensional Illustration 

Because the results of calculations in the full variational space (one constraint parameter for
each monomer) are somewhat complicated to present, we will illustrate the model in lower
dimensions to give the reader an intuitive feeling for the multi-dimensional free energy surface
and how folding routes are obtained from the model. We consider a two-dimensional approximation
in which we group the protein into two segments and assign a variational spring constant to each
group. In this example, monomers with index 1-50 correspond to $C_1$ and the rest correspond to
$C_2$. These two groups correspond roughly to the helices H1-H3 and H4-H5, respectively. This
grouping has a loose correspondence to the folding route obtained from the full variational
calculations discussed in the next section.

The free energy surface shown in Fig. 4 has two distinct low energy paths determined by the
saddle-points connecting the globule (G) and native (N) states. The average folding routes, as
defined by the path from the saddle-points to the local minima, are determined by the
eigenvector-following algorithm. As can be seen in Fig. 4, these routes are very close to the
steepest descents path. Along the two routes from G to N, in path 1 (dotted line) the constraint
$C_1$ progressively increases followed by the increase of constraint $C_2$, whereas along path 2
(solid line) this order is reversed with $C_2$ increasing before $C_1$.

The free energy of these paths is plotted parametrically versus the energy
($\Delta F(C_1, C_2)$ vs. $\Delta E(C_1, C_2)$) in Fig. 5, giving a free energy profile where
the saddle-points appear as local maxima. Path 2 is the relatively favored route since it has
the lower barrier to folding. We note that a rough description of the free energy profile can be
represented by connecting the stationary points of $F(C_1, C_2)$ by straight lines.

The ensemble of conformations composing the average folding route can be characterized by the
magnitude of the fluctuations of each residue
($B_i = \langle\delta\mathbf{r}_i^{2}\rangle_0$) for any set of constraints along these paths.
Fig. 6 shows the monomer fluctuations evaluated at the constraints corresponding to the globule
state and the saddle-points of both paths. These fluctuations give a description of folding
consistent with the two-dimensional surface parameterized by the constraints: in path 1 residues
in helices H1-H3 become structured at TS1 followed by H4-H5 at TS2, and in path 2 the order is
reversed. For a given set of the constraints (i.e., a given saddle-point), the fluctuations are
seen to smoothly interpolate between the two groups of monomers. The precise shape of the
interface depends on the value of the chain stiffness. A stiffer chain would tend to suppress
variations as a function of sequence index, resulting in a more gradual interface. As will be
seen in the full dimensional calculations, decreasing the chain stiffness allows the magnitude
of the fluctuations to change more rapidly between successive monomers, as expected.

C. Fine Structure of the Free Energy Profile: 
Multiple Transition States and Folding Routes 

We follow the same analysis outlined in the two- 
dimensional illustration to describe the folding paths cal- 
culated in the full variational space, but the free energy 
profile is evaluated only at the saddle-points and local 
minima rather than as a continuous path. 

In this study, we focus on how the folding routes depend on the persistence length of
the chain. To put various parameters into context, the homopolymers polyglycine,
polyalanine and polyproline have persistence lengths $l$ of approximately $2a$,
20 Å ($\approx 5a$), and 220 Å ($\approx 60a$), respectively. In the freely rotating
chain model $l = a/(1 - g)$, so that these persistence lengths correspond to the chain
stiffness parameters $g = 0.5$, 0.8, and 0.98, respectively. Modeling the protein
backbone with a single uniform chain stiffness parameter, we take the chain stiffness of
polyalanine to be a reasonable value for the protein backbone.
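
The freely rotating chain relation $l = a/(1 - g)$ quoted above is easy to tabulate. The following is a minimal sketch; the function names are hypothetical, and lengths are expressed in units of the monomer spacing $a$.

```python
def persistence_length(g, a=1.0):
    """Persistence length l = a / (1 - g) of a freely rotating chain."""
    if not 0.0 <= g < 1.0:
        raise ValueError("chain stiffness g must satisfy 0 <= g < 1")
    return a / (1.0 - g)

def chain_stiffness(l, a=1.0):
    """Inverse relation g = 1 - a / l."""
    if l < a:
        raise ValueError("persistence length must be at least a")
    return 1.0 - a / l

# Stiffness values quoted in the text and figure captions, in units of a.
for g in (0.5, 0.8, 0.89, 0.9, 0.93, 0.95):
    print(f"g = {g:4.2f}  ->  l = {persistence_length(g):5.2f} a")
```

Inverting the relation for $l = 60a$ gives $g \approx 0.983$, which rounds to the value 0.98 quoted for polyproline.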

We report the free energy profile at the folding transition temperature, $T_f$. This is
the temperature at which the folded and globule ensembles have the same equilibrium
probability ($\Delta F_G = \Delta F_N$). Since the entropy and energy loss in these
states depend on the chain stiffness, $T_f$ is different for different persistence
lengths. (For a flexible chain, $l = 2a$, $k_B T_f/\epsilon_0 \approx 1.2$, whereas for
a very stiff chain, $l = 20a$, $k_B T_f/\epsilon_0 \approx 2.2$.) Similarly, the
unconstrained ensemble (which we define as the zero of the free energy) is dependent on
the persistence length. Consequently, to compare the folding profiles of chains with
different persistence lengths, it is convenient to plot the free energy profile relative
to the globule free energy against a normalized energy coordinate

$$ E = \frac{\Delta E - \Delta E^G}{\Delta E^N - \Delta E^G}, \tag{31} $$

where $\Delta E^G$ and $\Delta E^N$ are the globule and native energy changes,
respectively. $E$ is the fractional stabilization energy and equals 0 at the globule
state and 1 at the native state.
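
As a small worked example of Eq. (31), the sketch below maps energy changes onto the normalized coordinate. The numerical values are made up for illustration and the function name is hypothetical.

```python
def fractional_stabilization(delta_e, delta_e_globule, delta_e_native):
    """Normalized energy coordinate of Eq. (31): 0 at the globule, 1 at the native state."""
    return (delta_e - delta_e_globule) / (delta_e_native - delta_e_globule)

# Illustrative (made-up) energy changes in units of the contact energy scale.
delta_e_globule = -2.0
delta_e_native = -10.0
halfway = fractional_stabilization(-6.0, delta_e_globule, delta_e_native)
```

Because the coordinate is defined relative to the globule and native values at each stiffness, profiles for different persistence lengths share the same horizontal range.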

Fig. 7 shows the free energy profile for chain stiffness parameters ranging from
$g = 0.5$ to $g = 0.95$ ($l = 2a$ to $l = 20a$). The free energy profile versus energy
for a flexible chain ($l = 2a$) is shown as the solid curve in Fig. 7(a). The profile
exhibits five transition states (saddle-points) separated by local minima. We label the
transition states sequentially from the globule minimum (G) to the native minimum (N),
and the minima by the indices of the adjacent transition states (e.g., the minimum
between TS$_1$ and TS$_2$ is denoted by min$_{12}$). The profile can be described as a
rugged fine structure, resulting from the different structural ensembles of the local
minima, superimposed on a single average free energy barrier. The fine structure on the
profile is modest in magnitude, amounting to a stabilization of a "high energy
intermediate" by at most $1\,k_B T_f$. This fine structure should not be confused with
the ruggedness due to frustration, which we call "transverse" ruggedness, coming from
degrees of freedom different from the one plotted; instead, the fine structure is better
described as "longitudinal" ruggedness (along the reaction coordinate). This is a common
feature of many models, even those which consider only native contacts and which
therefore have perfectly funnel-like surfaces.

Starting with the most flexible chains shown in Fig. 7(a), as the stiffness increases
the magnitude of the barrier increases and the minima separating the transition states
become progressively more shallow (the longitudinal ruggedness diminishes). A flexible
chain can take advantage of particularly strong contacts while losing a relatively small
amount of entropy to localize the pair. This gives rise to the possibility of relatively
stable local minima and lower free energy barriers. Stiffer chains must localize larger
segments of the chain (on the order of the persistence length), resulting in fewer
distinct local minima and larger free energy barriers. For the largest chain stiffness
in Fig. 7(a) ($l \approx 9a$) the local minima have nearly disappeared, leaving a free
energy profile with a single transition state ensemble. The profile for even stiffer
chains ($l = 10a$ to $l = 20a$) is shown in Fig. 7(b). In this parameter range, as the
chain stiffness increases further the barrier decreases and the single transition state
occurs at a larger fractional stabilization energy (i.e., the energy of the transition
state becomes closer to the native energy). When the persistence length is large enough,
large sections of the chain must be constrained, resulting in a more folded transition
state. Evidently, for very stiff chains the transition state is similar enough to the
native minimum that the barrier decreases.

The barrier height is plotted as a function of persistence length in Fig. 8 (solid
line). The maximum barrier height occurs for a persistence length $l \approx 10a$, with
an increase of approximately 70% relative to the barrier height for the most flexible
chain considered. However, the appropriate energy scale for the transition is
$k_B T_f$. In units of $k_B T_f$, the barrier height is relatively constant for a wide
range of persistence lengths (dashed line), changing by only approximately 10% for
persistence lengths up to $l \approx 12a$. The robustness of the barrier height at $T_f$
is interesting since the persistence length of real proteins in the laboratory is not
precisely known.

Each ensemble of structures along the average folding route can be characterized by the
local temperature factors [Eq. (29)]. The fluctuations corresponding to the transition
states for a flexible chain ($l = 2a$) are shown as the dotted curves in Fig. 9(a). The
fluctuations corresponding to the local minima lie between those of the adjacent
saddle-points; for clarity, only one local minimum, min$_{23}$, is shown (solid line).
The folding route can be characterized by considering the structure of the transition
states from the globule to the native structure. The first transition state ensemble,
TS$_1$, is described by the ordering of helices H4-H5, stabilized by the partial
localization of a region of helix H1, while residues in helices H2-H3 remain
delocalized. In the subsequent transition states, helix H1 becomes progressively more
ordered, while helices H2-H3 continue to have large fluctuations and are the last to
order. This scenario for the folding of λ-repressor agrees with the interpretation of
kinetic data based on $\phi$-value analysis.

The temperature factors for a larger persistence length ($l = 5a$) are shown in
Fig. 9(b). The behavior of the fluctuations describing the transition state ensembles is
qualitatively similar to that of the more flexible chain, though the magnitude of the
fluctuations of the disordered regions is larger. Some of the detailed features shown in
Fig. 9(a) have been smoothed out, since the chain stiffness suppresses variations along
the sequence on scales smaller than the persistence length. For example, the very
specific localization of a helix H1 residue shown in the min$_{23}$ curve of the more
flexible chain [Fig. 9(a)] has been broadened to a larger region for the stiffer chain.
These differences are rather subtle, but the comparison is useful to illustrate the
progression to larger chain stiffnesses. The fluctuations for the single transition
state characteristic of larger persistence lengths are shown in Fig. 9(c). (Note that
here the different curves correspond to different persistence lengths rather than to
transition states along the same folding route.) While the general shape of the curve is
maintained, the magnitude of the fluctuations becomes increasingly like that of the
native state, and for the largest persistence length considered, $l = 20a$, only the end
segments of the chain are significantly disordered.

D. Contiguous Sequence Approximations 

Several recent and apparently accurate estimates of folding kinetic parameters have
assumed that the transition state ensemble can be described by assigning contiguous
segments to be either folded or not, allowing the sequence to be parsed into such fully
folded or unfolded configurations. As Plotkin et al. argued, such a contiguous sequence
approximation (CSA) should apply to late transition states where the entropy is low.
How do the polymer's characteristics determine the quality of this approximation?

Following the contiguous sequence approach, to simplify the problem we reduce the number
of states in the variational theory's description of the protein ensembles by
restricting the conformations to only those in which the structured residues are fully
native and contiguous in sequence (single CSA), or alternatively, we consider two
contiguous stretches of native residues (double CSA). For each fixed value of the number
of folded residues, $N_f$, we find the minimum free energy configuration (satisfying the
single or double contiguous constraint). In this way we can construct a free energy
profile as a function of $N_f$. This construction neglects the connectivity of the path,
since the minimum free energy configuration with $N_f$ folded residues may not be simply
related to that with $N_f + 1$. This connectivity is an added complication to the
approach and can be treated by the methods presented in the cited references. For the
purposes of this illustration, we neglect this aspect of the approximation. The approach
outlined here is a great simplification of the more complete variational formalism since
it avoids the relatively difficult numerical calculation of finding saddle-points as a
function of degree of ordering. The number of configurations needed in the double CSA is
quite large but still tractable ($\approx 1.7$ million for λ-repressor). On the other
hand, these approximations are rather restrictive since they neglect partial ordering
and provide a less microscopic characterization of the folding route.
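
The size of the double CSA configuration space can be checked with a short counting sketch. It assumes (this is our reading, not a statement from the text) that the two folded stretches are nonempty and separated by at least one unfolded residue, and it takes the chain length $n = 80$ from the caption of Fig. 3. The function names are hypothetical.

```python
from itertools import combinations
from math import comb

def count_single(n):
    """Number of ways to place one contiguous folded stretch in n residues."""
    return comb(n + 1, 2)

def count_double(n):
    """Number of ways to place two folded stretches with a nonempty gap between them."""
    return comb(n + 1, 4)

def enumerate_double(n):
    """Yield (a, b, c, d): residues a..b and c..d are folded (0-based, inclusive).

    Four increasing cut points 0 <= p0 < p1 < p2 < p3 <= n define the
    stretches [p0, p1) and [p2, p3) and the unfolded gap [p1, p2).
    """
    for p0, p1, p2, p3 in combinations(range(n + 1), 4):
        yield p0, p1 - 1, p2, p3 - 1

n = 80  # chain length quoted in the caption of Fig. 3
n_double = count_double(n)
n_total = count_single(n) + n_double
print(n_double, n_total)
```

Under these assumptions the count is of order 1.7 million for $n = 80$, whether or not the single-stretch configurations are included. A profile in $N_f$ would then follow by evaluating a free energy for each enumerated configuration and keeping the minimum at each total number of folded residues.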

One issue in comparing the full variational and the CSA results is how to measure the
partial order characterizing the stationary ensembles in a global way that is comparable
to the number of fully native residues in the CSAs. For a given stationary point, we can
compare by estimating $N_f$ through the normalized fluctuations

$$ \bar{B}_i = \frac{B_i - B_i^G}{B_i^N - B_i^G}, \tag{32} $$

where the superscripts G and N denote the fluctuations evaluated at the globule and
native state, respectively. As a rough approximation, we define $N_f$ to be the number
of residues with $\bar{B}_i > 0.95$.
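
The estimate of $N_f$ from Eq. (32) can be sketched in a few lines, assuming the fluctuations are available as arrays over residues. The input values in the example are made up and the function names are hypothetical.

```python
import numpy as np

def normalized_fluctuations(B, B_globule, B_native):
    """Eq. (32): 0 where a residue fluctuates as in the globule, 1 as in the native state."""
    B = np.asarray(B, dtype=float)
    B_globule = np.asarray(B_globule, dtype=float)
    B_native = np.asarray(B_native, dtype=float)
    return (B - B_globule) / (B_native - B_globule)

def count_folded(B, B_globule, B_native, threshold=0.95):
    """Number of residues whose normalized fluctuation exceeds the threshold."""
    return int(np.sum(normalized_fluctuations(B, B_globule, B_native) > threshold))

# Made-up fluctuations for a four-residue example.
B_globule = [10.0, 10.0, 10.0, 10.0]
B_native = [1.0, 1.0, 1.0, 1.0]
B_saddle = [1.0, 1.2, 5.0, 10.0]
n_folded = count_folded(B_saddle, B_globule, B_native)
```

In this example the first two residues are counted as folded; the third is partially ordered and falls below the 0.95 threshold, which is the kind of partial order the CSAs neglect.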

The free energy profiles from the variational theory (dashed), the single CSA
(long-dashed), and the double CSA (solid) are plotted as a function of $N_f$ for three
different persistence lengths in Fig. 10. The barrier heights from the simpler
approximations are about twice the barriers obtained from the variational calculation
for each persistence length. This is consistent with results of the other contiguous
sequence approaches, where the barriers are much larger than those obtained from
simulations of Gō models with pairwise additive forces. In the present comparison, one
could also expect this behavior, since a variational Hamiltonian with fewer degrees of
freedom generally gives higher barriers (cf. the barriers in the two-dimensional
illustration, Fig. 5, with the full dimensional calculation in Fig. 7). Nevertheless,
the simpler approximations provide an intuitive explanation that agrees qualitatively
with the magnitudes of the barrier heights. Confining the residues to be either folded
or unfolded (in contiguous stretches) is responsible for the larger barrier heights,
i.e., partial ordering reduces the barrier. This reduction seen in the more accurate
calculation is reminiscent of the way wetting between two stable phases in nucleated
phase transformations lowers nucleation barriers by reducing surface tension, and is
discussed in the context of protein folding in the cited literature.

Considering the free energy profiles for the two approximations for a flexible chain
($l = 2a$) plotted in Fig. 10(a), the maximum free energy for the single CSA is
approximately 50% larger than the barrier from the double CSA and occurs at a greater
value of $N_f$. Again, there is no surprise in finding that the barrier from the double
contiguous approximation is lower, since the single contiguous configurations are a
subset of the double contiguous configurations. It is interesting, however, to compare
this difference as the chain stiffness increases. For a larger chain stiffness
corresponding to polyalanine ($l = 5a$), shown in Fig. 10(b), the difference in the
barrier height from the single and double CSAs decreases, with the single CSA barrier
approximately 30% larger than the double CSA barrier. Increasing the stiffness further
($l \approx 14a$), the free energy profiles shown in Fig. 10(c) are more similar,
differing by only 10%. Since the profiles become still more similar as the chain
stiffness increases (even though the conformations in the double CSA are less
restricted), this comparison suggests that the single CSA more accurately describes
stiff chain conformations than it does flexible chain conformations.

To make this connection more precise, we consider which residues are ordered along the
folding route in these approximations. In the present treatment, this information can be
easily represented by a plot of the folded regions specified by the monomer index as a
function of $N_f$. The structured parts of the chain are indicated by the shaded regions
in Fig. 11. To characterize the structure at the saddle-points of the variational free
energy surface we consider the Gaussian measure to the native structure

$$ \rho_i = \left\langle \exp\left[ -\frac{3}{2a^2}\,\alpha^N (\mathbf{r}_i - \mathbf{r}_i^N)^2 \right] \right\rangle_0
          = (1 + \alpha^N G_{ii})^{-3/2} \exp\left[ -\frac{3}{2a^2}\,\frac{\alpha^N (\mathbf{s}_i - \mathbf{r}_i^N)^2}{1 + \alpha^N G_{ii}} \right]. \tag{33} $$

This measure of the monomer density relative to the native position is the order
parameter employed in the companion paper to study the dynamics of the barrier crossing.
The degree of native structure at the transition state can be characterized by the
normalized measure

$$ \bar{\rho}_i = \frac{\rho_i - \rho_i^G}{\rho_i^N - \rho_i^G}, \tag{34} $$

where the superscripts G and N denote Eq. (33) evaluated at the constraints
corresponding to the globule and native states, respectively.
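
The closed form of the Gaussian measure, Eq. (33), and its normalization, Eq. (34), can be evaluated directly once the average positions and the diagonal of $G$ are known. The sketch below assumes these are supplied as arrays; the width parameter `alpha_n` and all function names are placeholders for illustration.

```python
import numpy as np

def native_density(s, r_native, G_diag, alpha_n, a=1.0):
    """Closed form of Eq. (33) for every monomer.

    s, r_native : arrays of shape (n, 3), average and native positions
    G_diag      : array of shape (n,), diagonal elements G_ii
    alpha_n     : width parameter of the Gaussian measure (placeholder value)
    """
    s = np.asarray(s, dtype=float)
    r_native = np.asarray(r_native, dtype=float)
    G_diag = np.asarray(G_diag, dtype=float)
    dist2 = np.sum((s - r_native) ** 2, axis=1)
    denom = 1.0 + alpha_n * G_diag
    return denom ** -1.5 * np.exp(-1.5 * alpha_n * dist2 / (a**2 * denom))

def normalized_density(rho, rho_globule, rho_native):
    """Eq. (34): 0 at the globule value, 1 at the native value."""
    rho = np.asarray(rho, dtype=float)
    return (rho - rho_globule) / (rho_native - rho_globule)

# A monomer sitting exactly on its native position with unit fluctuation G_ii = 1.
rho_on_site = native_density([[0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]], [1.0], alpha_n=1.0)
```

The measure approaches 1 only when a monomer is both centered on its native position and tightly localized, so it penalizes displacement and large fluctuations together.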

Consider first the folded residues from the double CSA for a flexible chain ($l = 2a$)
shown in Fig. 11(a). Consistent with the full variational results, the shaded region
clearly shows that the structure forms within helices H4-H5 and helix H1. Superimposed
on the regions of folded residues (from the double CSA) is $\bar{\rho}_i$ as a function
of monomer index for the four main transition states of the folding route of a chain of
the same stiffness (from the variational theory). The density plots of $\bar{\rho}_i$
indicate that the relatively unfolded regions are indeed more structured than the
globule value $\bar{\rho}_i = 0$. In particular, at the interface between the folded and
unfolded regions in the double CSA, $\bar{\rho}_i$ takes intermediate values, indicating
partial ordering. Nevertheless, the folded residues in the double CSA agree
qualitatively with the structure obtained from the variational saddle-points. In
contrast, the folded residues in the single CSA do not agree well with the variational
theory, as shown in Fig. 11(b).

Figs. 11(c) and 11(d) show structured regions along the folding routes for a greater
chain stiffness, $l = 5a$, the value appropriate for polyalanine. For this chain
stiffness, the unfolded region between helix H1 and helices H4-H5 in the double CSA
closes at a smaller value of $N_f$ compared with the more flexible chain. In this sense
the structured residues from the two CSAs are in better qualitative agreement, though
the discrepancy between the two is still pronounced. This difference between the two
approximations is greatly reduced for a chain with a much larger persistence length
($l \approx 14a$), shown in Figs. 11(e) and 11(f). While there are still unfolded
regions between folded regions in the double sequence approximation, they persist only
for a very limited range of $N_f$. For this persistence length, the variational free
energy surface has only one transition state, as indicated by the density plot of
$\bar{\rho}_i$. The structure determined by the saddle-point agrees qualitatively with
the structure indicated by the double CSA, but also with the single CSA (because the two
are similar).

VI. CONCLUSION

In this paper, we used a variational approach to calculate the free energy profiles and
characterize the transition state ensembles applicable when nonnative contacts can be
ignored. Using λ-repressor as an illustrative example, we investigated the role of chain
stiffness on the fine structure of the free energy profile. We found that increasing the
persistence length of the chain tends to smooth the free energy profile, making
longitudinal ruggedness less pronounced. The transition state ensemble with very stiff
chains was found to be more folded than the ensemble with more flexible chains. These
results can be interpreted in terms of the tension between taking full advantage of
strong local contacts and respecting the bending rigidity of the chain. We also found
that while the absolute barrier height has a pronounced maximum as a function of
persistence length, the barrier scaled by the folding transition temperature, $k_B T_f$,
was relatively robust over a wide range of persistence lengths.

This study allowed us to investigate the applicability of simpler contiguous sequence
approximations proposed recently. Both the free energy profiles and the folded residues
along the folding routes suggest that the single CSA more accurately describes stiffer
chains. Since the folded residues in the single CSA are roughly independent of the
persistence length, the single CSA corresponds to a chain that has a longer effective
persistence length than is appropriate for most natural proteins. The double CSA is able
to capture the appropriate folded regions near the free energy barrier. On the other
hand, the neglect of partial ordering leads to overestimates of the absolute barrier
height for the λ-repressor protein. Nevertheless, these approximations should be good
enough to compute the perturbations of the activated free energy found in protein
engineering experiments that probe the transition state ensemble ($\phi$-analysis).

APPENDIX A:

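In this appendix, we outline a derivation of the expression for the entropy of the
constrained chain given in the main text.

The partition function of the harmonically constrained chain is given by

$$ Z_0 = \exp\Big[ -A \sum_i C_i (\mathbf{r}_i^N)^2 \Big]
   \times \mathrm{Tr} \exp\Big[ -A \sum_{ij} \mathbf{r}_i \cdot \Gamma^{(0)}_{ij} \mathbf{r}_j
   + 2A \sum_i C_i \mathbf{r}_i^N \cdot \mathbf{r}_i \Big], \tag{A1} $$

with $A = 3/2a^2$, where Tr denotes $\int \prod_i d\mathbf{r}_i$. These integrals are
easy to evaluate by completing squares:

$$ Z_0 = (\pi/A)^{3n/2} (\det G)^{3/2}
   \times \exp\Big[ -A \sum_i C_i (\mathbf{r}_i^N)^2
   + A \sum_{ij} C_i \mathbf{r}_i^N \cdot G_{ij} C_j \mathbf{r}_j^N \Big], \tag{A2} $$

where $G = (\Gamma^{(0)})^{-1}$. Note that the expression for the average monomer
positions, $\mathbf{s}_i = \sum_j G_{ij} C_j \mathbf{r}_j^N$, can be used to express the
second term in the exponent as

$$ A \sum_{ij} C_i \mathbf{r}_i^N \cdot G_{ij} C_j \mathbf{r}_j^N
   = A \sum_i C_i \mathbf{r}_i^N \cdot \mathbf{s}_i, \tag{A3} $$

in terms of the average monomer positions, $\{\mathbf{s}_i\}$. From the definition of
the entropy in the main text, combining the partition function with

$$ A \sum_i C_i \langle (\mathbf{r}_i - \mathbf{r}_i^N)^2 \rangle_0
   = A \sum_i C_i \big( G_{ii} a^2 + \mathbf{s}_i^2
   - 2\, \mathbf{s}_i \cdot \mathbf{r}_i^N + (\mathbf{r}_i^N)^2 \big) \tag{A4} $$

gives the entropy of the constrained chain (ignoring a constant term)

$$ S[\{C\}] = \frac{3}{2} \log \det G + \frac{3}{2} \sum_i C_i G_{ii}
   + A \sum_i C_i \mathbf{s}_i^2 - A \sum_i C_i \mathbf{r}_i^N \cdot \mathbf{s}_i. \tag{A5} $$

This expression is easier to interpret after simplifying the last two terms. Inserting
the identity, $G \Gamma^{(0)} = 1$, into the last term and using the expression for the
average monomer positions gives

$$ A \sum_i C_i \mathbf{r}_i^N \cdot \mathbf{s}_i
   = A \sum_{ikj} C_i \mathbf{r}_i^N \cdot G_{ik} \Gamma^{(0)}_{kj} \mathbf{s}_j
   = A \sum_{kj} \mathbf{s}_k \cdot \Gamma^{(0)}_{kj} \mathbf{s}_j. \tag{A6} $$

In this form, the last two terms of Eq. (A5) can be combined:

$$ -A \sum_{kj} \mathbf{s}_k \cdot \big( \Gamma^{(0)}_{kj} - C_k \delta_{kj} \big) \mathbf{s}_j
   = -A \sum_{kj} \mathbf{s}_k \cdot \Gamma^{(ch)}_{kj} \mathbf{s}_j, \tag{A7} $$

where we have used the definition of $\Gamma^{(0)}$ given in the main text. Thus, we
have

$$ S[\{C\}] = \frac{3}{2} \log \det G + \frac{3}{2} \sum_i C_i G_{ii}
   - A \sum_{kj} \mathbf{s}_k \cdot \Gamma^{(ch)}_{kj} \mathbf{s}_j, \tag{A8} $$

which is the expression given in the main text.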
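
The step from Eq. (A5) to Eq. (A8) can be checked numerically. The sketch below is only a consistency check under stated assumptions: a random symmetric positive definite matrix stands in for the chain connectivity matrix, $\Gamma^{(0)}$ is taken to be that matrix plus the diagonal matrix of constraints, and the constraints and native positions are random. All names are hypothetical.

```python
import numpy as np

def constrained_chain_entropy(gamma_ch, C, r_native, a=1.0):
    """Return the entropy evaluated in the forms of Eq. (A5) and Eq. (A8)."""
    A = 3.0 / (2.0 * a**2)
    gamma_0 = gamma_ch + np.diag(C)         # assumed: Gamma^(0) = Gamma^(ch) + C delta
    G = np.linalg.inv(gamma_0)              # G = (Gamma^(0))^-1
    s = G @ (C[:, None] * r_native)         # average positions s_i = sum_j G_ij C_j r_j^N
    _, logdet = np.linalg.slogdet(G)
    common = 1.5 * logdet + 1.5 * np.sum(C * np.diag(G))
    s_a5 = (common
            + A * np.sum(C * np.sum(s * s, axis=1))
            - A * np.sum(C * np.sum(r_native * s, axis=1)))
    s_a8 = common - A * np.sum(s * (gamma_ch @ s))
    return s_a5, s_a8

rng = np.random.default_rng(0)
n = 6
M = rng.normal(size=(n, n))
gamma_ch = M @ M.T + n * np.eye(n)          # random SPD stand-in, not a real chain
C = rng.uniform(0.1, 2.0, size=n)
r_native = rng.normal(size=(n, 3))
s_a5, s_a8 = constrained_chain_entropy(gamma_ch, C, r_native)
```

The two forms agree to rounding error for any symmetric choice of the connectivity matrix, which is all the algebra between Eq. (A5) and Eq. (A8) requires.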

finite freely rotating chain with chain stiffness $g$, $C_\infty = (1 + g)/(1 - g)$;
using $l = a/(1 - g)$ gives $l = (C_\infty + 1)a/2$.

M. Silow and M. Oliveberg, J. Mol. Biol. 269, 611 (1997).

S. S. Plotkin, J. Wang, and P. G. Wolynes, Phys. Rev. E 53, 6271 (1996).

The configurations in the contiguous sequence approximations are specified by the
constraint parameters $\{C_i\}$: if the $i$th residue is fully folded (unfolded), $C_i$
is set to the value of the corresponding constraint at the native (globule) minimum.

M. P. Eastwood and P. G. Wolynes, J. Chem. Phys. (in press).

P. G. Wolynes, Proc. Natl. Acad. Sci. USA 94, 6170 (1997).

A. R. Fersht, A. Matouschek, and L. Serrano, J. Mol. Biol. 224, 771 (1992).




FIG. 1. Interaction potential and λ-repressor native contact distribution. The
intermediate- and long-range parameters are given in the text.

FIG. 2. Energy in units of the Miyazawa-Jernigan energy scale $\epsilon_0$ as a function
of the constraint, $C$. All monomers have equal constraint (1-dimensional), and the
chain stiffness is $g = 0.8$. $\gamma_s$ is the strength of the short-range interaction
and the short-range Gaussian width parameter is 3. The intermediate- and long-range
interaction parameters are the same as in Fig. 1.

FIG. 3. Radius of gyration as a function of persistence length (in units of monomer
spacing $a$) for λ-repressor ($n = 80$) and confinement parameter B = 10^'^: no
constraints (solid), globule (○), and native coordinates (dashed).

FIG. 4. Contour plot of free energy in units of the Miyazawa-Jernigan energy scale
$\epsilon_0$ for the two-dimensional surface (see text). The lines indicate the average
folding routes: Path 1 (dotted), Path 2 (solid). The chain stiffness is $g = 0.8$.

FIG. 5. Free energy profile for both paths indicated in Fig. 4.

FIG. 6. Fluctuations vs. sequence index, $\langle \delta r_i^2 \rangle_0 = G_{ii} a^2$,
where $a$ is the distance between successive monomers, evaluated at the saddle-points
for Path 1 (dotted) and Path 2 (solid) shown in Fig. 4. Fluctuations of the native (N)
and globule (G) states are also shown.

FIG. 7. Free energy profile vs. normalized energy [Eq. (31)] for different persistence
lengths: (a) $g = 0.5$, 0.8, 0.85, 0.89; (b) $g = 0.9$, 0.92, 0.94, 0.95.

FIG. 8. Barrier height vs. persistence length. Free energy scaled by the
Miyazawa-Jernigan energy scale $\epsilon_0$ (solid), and by the folding temperature
$k_B T_f$ (dashed).

FIG. 9. Fluctuations vs. sequence index, $\langle \delta r_i^2 \rangle_0 = G_{ii} a^2$,
where $a$ is the distance between successive monomers, of selected stationary points on
the folding route for different persistence lengths: (a) $l = 2a$ ($g = 0.5$),
(b) $l = 5a$ ($g = 0.8$), (c) $l/a \approx 10$, 11, 13, 20 ($g = 0.9$, 0.91, 0.92, 0.95).

FIG. 10. Free energy vs. number of folded residues for persistence lengths (a) $l = 2a$
($g = 0.5$), (b) $l = 5a$ ($g = 0.8$), and (c) $l \approx 14a$ ($g = 0.93$), scaled by
the Miyazawa-Jernigan contact energy scale, $\epsilon_0$. The different curves
correspond to the contiguous sequence approximation (solid), the double approximation
(dashed), and the variational theory (dotted). For the variational profile, $N_f$ is
defined by the number of residues with $\bar{B}_i > 0.95$ (see text).

FIG. 11. Configuration of native residues vs. number of folded residues at different
persistence lengths for the double (top: a, c, e) and the contiguous (bottom: b, d, f)
sequence approximations. The columns correspond to the persistence lengths: $l = 2a$
(a, b), $l = 5a$ (c, d), $l = 14a$ (e, f). Residues set to native constraints are
indicated by the shaded region. The density plot corresponds to the normalized native
density [Eq. (34)] evaluated at the transition states for each persistence length.
(Black to white represents $\bar{\rho}_i = 0$ to 1.) The ordinate for the density plot,
$N_f$, is defined by the number of residues with $\bar{B}_i > 0.95$ (see text).